Flexible protein sequence patterns. A sensitive method to detect weak structural similarities.

Authors

  • G J Barton
  • M J Sternberg
Abstract

The concept of a flexible protein sequence pattern is defined. In contrast to conventional pattern matching, template or sequence alignment methods, flexible patterns allow residue patterns typical of a complete protein fold to be developed in terms of residue positions (elements), separated by gaps of defined range. An efficient dynamic programming algorithm is presented to enable the best alignment(s) of a pattern with a sequence to be identified. The flexible pattern method is evaluated in detail by reference to the globin protein family, and by comparison to alignment techniques that exploit single sequence, multiple sequence and secondary structural information. A flexible pattern derived from seven globins aligned on structural criteria successfully discriminates all 345 globins from non-globins in the Protein Identification Resource database. Furthermore, a pattern that uses helical regions from just human alpha-haemoglobin identified 337 globins compared to 318 for the best non-pattern global alignment method. Patterns derived from successively fewer, yet more highly conserved positions in a structural alignment of seven globins show that as few as 38 residue positions (25 buried hydrophobic, 4 exposed and 9 others) may be used to uniquely identify the globin fold. The study suggests that flexible patterns gain discriminating power both by discarding regions known to vary within the protein family, and by defining gaps within specific ranges. Flexible patterns therefore provide a convenient and powerful bridge between regular expression pattern matching techniques and more conventional local and global sequence comparison algorithms.
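
The dynamic programming step can be illustrated with a short sketch. This is not the authors' implementation. It assumes a hypothetical representation in which each element is a dictionary of per-residue scores, with unlisted residues scoring `default` (0 here). It also assumes that each gap is an inclusive (min, max) count of residues separating consecutive elements. Element i may sit at sequence position j only if element i-1 sits at j - g - 1 for some allowed gap g. The recurrence keeps the highest cumulative score and a traceback pointer for each placement. The function name `align_flexible_pattern` and the toy scores are illustrative only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation (not from the paper): an element maps residue
# letters to scores; a gap range is the inclusive (min, max) number of
# residues allowed between two consecutive elements.
Element = Dict[str, float]
GapRange = Tuple[int, int]


def align_flexible_pattern(
    elements: Sequence[Element],
    gaps: Sequence[GapRange],
    seq: str,
    default: float = 0.0,
) -> Tuple[float, List[int]]:
    """Return (best score, 0-based element positions), or (-inf, []) if no
    placement of the elements on seq satisfies every gap range."""
    n, m = len(elements), len(seq)
    if n == 0:
        raise ValueError("pattern must contain at least one element")
    if len(gaps) != n - 1:
        raise ValueError("need exactly one gap range between consecutive elements")
    neg_inf = float("-inf")
    # best[i][j]: best score of elements 0..i with element i at position j
    best = [[neg_inf] * m for _ in range(n)]
    # back[i][j]: position of element i-1 in that best placement
    back: List[List[Optional[int]]] = [[None] * m for _ in range(n)]

    for j in range(m):
        best[0][j] = elements[0].get(seq[j], default)

    for i in range(1, n):
        lo, hi = gaps[i - 1]
        for j in range(m):
            for gap in range(lo, hi + 1):
                p = j - gap - 1  # where element i-1 would have to sit
                if p < 0:
                    break  # larger gaps only move p further left
                if best[i - 1][p] == neg_inf:
                    continue
                cand = best[i - 1][p] + elements[i].get(seq[j], default)
                if cand > best[i][j]:
                    best[i][j] = cand
                    back[i][j] = p

    # Best end position of the last element, then trace back to the first.
    end = max(range(m), key=lambda j: best[n - 1][j], default=None)
    if end is None or best[n - 1][end] == neg_inf:
        return neg_inf, []
    positions = [end]
    for i in range(n - 1, 0, -1):
        positions.append(back[i][positions[-1]])
    positions.reverse()
    return best[n - 1][end], positions


if __name__ == "__main__":
    hydrophobic = {"L": 2.0, "I": 2.0, "V": 2.0, "F": 2.0}  # toy scores, not from the paper
    trp = {"W": 5.0}
    print(align_flexible_pattern([hydrophobic, trp, hydrophobic], [(1, 3), (0, 2)], "AALGGWKLA"))
    # → (9.0, [2, 5, 7])
```

The same toy pattern written as a regular expression, `[LIVF].{1,3}W.{0,2}[LIVF]`, could only accept or reject a segment. The scored recurrence instead ranks every placement that respects the gap ranges, which illustrates the bridge between pattern matching and alignment described above. This naive version costs O(n·m·w) for n elements, sequence length m and maximum gap-range width w. It should be read as a sketch under the stated assumptions, not as the paper's efficient algorithm.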

Similar resources

Within the twilight zone: a sensitive profile-profile comparison tool based on information theory.

This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments...

A sequence property approach to searching protein databases.

Currently available sequence alignment programs are generally not capable of detecting functional and structural homologs in the twilight zone of sequence similarity, i.e. when the sequence identity falls below about 25%. Here we attempt to detect such weak similarities using an approach based on a notion of protein sequence similarity radically different from that used in sequential alignment....

Relationship between protein structures and disulfide-bonding patterns.

We found that disulfide-bonding patterns can be used to discriminate structure similarity. Our method, based on the hierarchical clustering scheme, is applicable to proteins with two or more disulfide bonds and is able to detect the structural similarities of proteins of low sequence identities (<25%). Our results show the surprisingly close relationship between disulfide-bonding patterns ...

Solenoid and non-solenoid protein recognition using stationary wavelet packet transform

MOTIVATION Solenoid proteins are emerging as a protein class with properties intermediate between structured and intrinsically unstructured proteins. Containing repeating structural units, solenoid proteins are expected to share sequence similarities. However, in many cases, the sequence similarities are weak and non-detectable. Moreover, solenoids can be degenerated and widely vary in the numb...

Database of patterns PROF_PAT, used to detect local similarities

Motivation: When analysing novel protein sequences, it is now essential to extend search strategies to include a range of 'secondary' databases. Pattern databases have become vital tools for identifying distant relationships in sequences, and hence for predicting protein function and structure. The main drawback of such methods is the relatively small representation of proteins in trial ...

Journal title:
  • Journal of Molecular Biology

Volume 212, Issue 2

Publication date: 1990